Constraints on Parallelism Beyond 10 Instructions Per Cycle
Authors
Abstract
The problem of extracting Instruction-Level Parallelism at levels of 10 instructions per clock and higher is considered. Two different architectures that use speculation on memory accesses to achieve this level of performance are reviewed. It is pointed out that while this form of speculation gives high potential parallelism, it is necessary to retain execution state so that incorrect speculation can be detected and subsequently squashed. Simulation results show that the space to store such state is a critical resource in obtaining good speedup. To make good use of the space, it is essential that state be stored efficiently and that it be retired as soon as possible. A number of techniques for extracting the best usage from the available state storage are introduced.
Similar Resources
Effects of Re-ordered Memory Operations on Parallelism
The performance effect of permitting different memory operations to be re-ordered is examined. The available parallelism is computed using a machine code simulator. A range of possible restrictions on the re-ordering of memory operations is considered: from the purely sequential case where no re-ordering is permitted; to the completely permissive one where memory operations may occur in any ord...
Performance Measures of Superscalar Processor
In this paper the author describes the superscalar processor and its architecture. A superscalar architecture is one in which several instructions can be initiated simultaneously and executed independently. Pipelining allows several instructions to be executed at the same time, but they have to be in different pipeline stages at a given moment. Superscalar architectures include all features o...
Optimum Instruction-level Parallelism (ILP) for Superscalar and VLIW Processors
Modern superscalar and VLIW processors fetch, decode, issue, execute, and retire multiple instructions per cycle. By taking advantage of instruction-level parallelism (ILP), processor performance can be improved substantially. However, increasing the level of ILP may eventually result in diminishing and negative returns due to control and data dependencies among subsequent instructions as well ...
A case for merging the ILP and DLP paradigms
The goal of this paper is to show that instruction level parallelism (ILP) and data-level parallelism (DLP) can be merged in a single architecture to execute vectorizable code at a performance level that cannot be achieved using either paradigm on its own. We will show that the combination of the two techniques yields very high performance at a low cost and a low complexity. We will show that ...
Design and Validation of a Simultaneous Multi-Threaded DLX Processor
Modern-day computer systems rely on two forms of parallelism to achieve high performance: parallelism between individual instructions of a program (ILP) and parallelism between individual threads (TLP). Superscalar processors exploit ILP by issuing several instructions per clock, and multiprocessors (MP) exploit TLP by running different threads in parallel on different processors. A fundamental...
Publication date: 1997